Reverse Engineering Top-k Database Queries with PALEO

نویسندگان

  • Kiril Panev
  • Sebastian Michel
چکیده

Ranked lists are an essential methodology to succinctly summarize outstanding items, computed over database tables or crowdsourced in dedicated websites. In this work, we address the problem of reverse engineering top-k queries over a database, that is, given a relation R and a sample topk result list, our approach, named PALEO, aims at determining an SQL query that returns the provided input result when executed over R. The core problem consists of finding predicates of the where clause that return the given items, determining the correct ranking criteria, and to evaluate the most promising candidate queries first. To capture cases where only a sample of R is available or when R is different to the relation that indeed generated the input, we put forward a probabilistic model that allows assessing the chance of a query to output tuples that are resembling or are somewhat close to the input data. We further propose an iterative candidate query execution to further eliminate unpromising queries before being executed. We report on the results of a comprehensive performance evaluation using data and queries of the TPC-H and SSB [14] benchmarks.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Reverse Engineering Top-k Join Queries

Ranked lists have become a fundamental tool to represent the most important items taken from a large collection of data. Search engines, sports leagues and e-commerce platforms present their results, most successful teams and most popular items in a concise and structured way by making use of ranked lists. This paper introduces the PALEO-J framework which is able to reconstruct top-k database q...

متن کامل

Exploring Databases via Reverse Engineering Ranking Queries with PALEO

A novel approach to explore databases using ranked lists is demonstrated. Working with ranked lists, capturing the relative performance of entities, is a very intuitive and widely applicable concept. Users can post lists of entities for which explanatory SQL queries and full result lists are returned. By refining the input, the results, or the queries, user can interactively explore the databas...

متن کامل

Identifying the Most Influential Data Objects with Reverse Top-k Queries

Top-k queries are widely applied for retrieving a ranked set of the k most interesting objects based on the individual user preferences. As an example, in online marketplaces, customers (users) typically seek a ranked set of products (objects) that satisfy their needs. Reversing top-k queries leads to a query type that instead returns the set of customers that find a product appealing (it belon...

متن کامل

Estimation of Potential Product Using Reverse Top-k Queries

Atpresent, most of the applications return to the user a limited set of ranked results based on the individual user’s preferences, which are commonly validated through top-k queries. From the perspective of a manufacturer, it is imperative that the products appear in the highest ranked positions for many different user preferences, otherwise the product is not visible to the potential customers...

متن کامل

Answering Why-not Questions on Reverse Top-k Queries

Why-not questions, which aim to seek clarifications on the missing tuples for query results, have recently received considerable attention from the database community. In this paper, we systematically explore why-not questions on reverse top-k queries, owing to its importance in multi-criteria decision making. Given an initial reverse top-k query and a missing/why-not weighting vector set Wm th...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016